Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Abstract
With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a language informant, potentially not even large amounts of monolingual data. We investigate inducing a Moroccan Darija-English translation lexicon via French loanwords bridging into English and find that a useful lexicon is induced for human-assisted translation and statistical machine translation.
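The abstract does not spell out the induction procedure, so the following is only a minimal sketch of the general bridging idea: compose a source-to-pivot loanword mapping with a pivot-to-target dictionary. The function name `bridge_lexicon` and every word entry are hypothetical illustrations, not data or code from the paper. The Darija-to-French mapping is assumed to come from a language informant, and the romanized spellings are assumptions.

```python
from collections import defaultdict


def bridge_lexicon(source_to_pivot, pivot_to_target):
    """Compose source->pivot and pivot->target mappings into source->target.

    Source words whose pivot form has no target entry are left out.
    """
    induced = defaultdict(set)
    for source_word, pivot_words in source_to_pivot.items():
        for pivot_word in pivot_words:
            for target_word in pivot_to_target.get(pivot_word, ()):
                induced[source_word].add(target_word)
    # Sort the candidate translations so the output is deterministic.
    return {word: sorted(targets) for word, targets in induced.items()}


# Hypothetical toy entries, for illustration only (not from the paper).
darija_to_french = {
    "tomobil": ["automobile"],
    "forchita": ["fourchette"],
    "blasa": ["place"],  # no English entry below, so it is dropped
}
french_to_english = {
    "automobile": ["car", "automobile"],
    "fourchette": ["fork"],
}

lexicon = bridge_lexicon(darija_to_french, french_to_english)
for darija_word, english_words in sorted(lexicon.items()):
    print(darija_word, "->", ", ".join(english_words))
```

Keeping every candidate translation, rather than picking one, leaves the final choice to a human translator or a downstream translation system.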
Similar resources
Stochastic Language Models for Automatic Acquisition of Lexicons from Printed Bilingual Dictionaries
Electronic bilingual lexicons are crucial for machine translation, cross-lingual information retrieval and speech recognition. For low-density languages, however, the availability of electronic bilingual lexicons is questionable. One solution is to acquire electronic lexicons from printed bilingual dictionaries. While manual data entry is a possibility, automatic acquisition of lexicons from sc...
LEXICALL: Lexicon Construction for Foreign Language Tutoring
We focus on the problem of building large repositories of lexical conceptual structure (LCS) representations for verbs in multiple languages. One of the main results of this work is the definition of a relation between broad semantic classes and LCS meaning components. Our acquisition program, LEXICALL, takes, as input, the result of previous work on verb classification and thematic grid tagging, a...
Segmentation and Translation of Japanese Multi-word Loanwords
The Japanese language has absorbed large numbers of loanwords from many languages, in particular English. As well as using single loanwords, compound nouns, multiword expressions (MWEs), etc. constructed from loanwords can be found in use in very large quantities. In this paper we describe a system which has been developed to segment Japanese loanword MWEs and construct likely English translati...
Semi-automatic Acquisition of Domain-specific Translation Lexicons
We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons.
Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora
The purpose of this paper is to automatically create multilingual translation lexicons with regional variations. We propose a transitive translation approach to determine translation variations across languages that have insufficient corpora for translation via the mining of bilingual search-result pages and clues of geographic information obtained from Web search engines. The experimental resu...